Information erasure lurking behind measures of complexity 
Complex systems are found in most branches of science. It is still argued how to best quantify 
their complexity and to what end. One prominent measure of complexity (the statistical complexity) 
has an operational meaning in terms of the amount of resources needed to forecast a system's 
behaviour. Another one (the effective measure complexity, aka excess entropy) is a measure of 
mutual information stored in the system proper. We show that for any given system the two 
measures differ by the amount of information erased during forecasting. We interpret the difference 
as inefficiency of a given model. We find a bound to the ratio of the two measures defined as 
information-processing efficiency, in analogy to the second law of thermodynamics. This new link 
between two prominent measures of complexity provides a quantitative criterion for good models of 
complex systems, namely those with little information erasure. 
The idea of physics as information has a long history. The concept of entropy, at the heart of information theory, originated in the theory of thermodynamics. It was Maxwell and Boltzmann who, in the 19th century, recognized the intricate link between probability distributions over configurations and thermodynamics. This laid the foundation of the field of statistical mechanics. The similarity between the thermodynamic entropy and the information entropy, introduced in 1948 by Shannon, led to a whole new perspective on physical processes as storing and processing information. It also led to paradoxes such as Maxwell's demon, which seemed to suggest that work could be generated from heat using information alone, which would violate the second law of thermodynamics (for a review, see Refs. [1, 2]). The paradox was solved independently by Penrose and Bennett by considering the entropy creation caused by erasing information [3, 4]. Since the insight that information is physical and physics is information, one has started to regard nature as a grand information processor of both classical and quantum information (see e.g. Refs. [?]). This point of view is especially fruitful in the study of complex systems. Here, the physical laws are often missing, and other means of modelling a system's behaviour have to be found. Information theory comes in handy: it provides the tools for distinguishing structure from randomness in a given data set [?], a distinction which is the basis of any reasonable model. Complexity lies between disorder and order. Hence, a good measure of complexity is zero for both completely random objects and trivially ordered objects.

In the following we will concentrate on two computable measures of complexity: the statistical complexity [8, 9] and the effective measure complexity [10], aka excess entropy [?]. The excess entropy measures the internal information of a process which is communicated from the past to the future. The statistical complexity, on the other hand, measures the amount of information required to predict the process. Hence, it is a property more of the model than of the process. Any inconsistency between the two is puzzling at first: one should not require more information to model a process than the process itself uses. The central and new result of this paper is that this inconsistency can be explained by information erasure. This allows for a direct computation of the excess entropy from the statistical complexity, which had not been possible before. Until now the excess entropy had to be computed numerically [?] or analytically via a computationally expensive procedure [11]. We also obtain a simple interpretation of the difference between the two measures in terms of the information-processing efficiency of the model used for prediction. We find that this efficiency is bounded in the same way the generation of work is bounded by the second law of thermodynamics.

Many proposals for complexity measures exist, such as effective measure complexity [10], statistical complexity [?], logical depth [13], thermodynamic depth [5], and effective complexity [?], to mention a few. Some measure structure, as expected [?]. Some measure only randomness, which renders them unsuitable [4] (see [14]). Some are computable, others are not. And most often they are completely unrelated quantities. If one is interested in measures which capture complexity and not randomness and which, in addition, are computable, then one is left with the mutual information and related quantities [10, 15, 16, 17] and the statistical complexity [8, 9].
Therefore, these are the measures we focus on in this paper.

Consider a complex system which is studied in a sequence of observations. We call the time-independent probability distribution $\Pr(\overleftrightarrow{S})$ over such an (infinite) sequence of observations $\overleftrightarrow{S}$ a stationary process. The framework of computational mechanics provides the tools to infer a computation-theoretic model of the process, which is provably the most compact description while reproducing the statistics exactly [8, 9]. This computation-theoretic model, called ε-machine, consists of an output alphabet $\mathcal{A}$, a set of causal states $\mathcal{S}$ and stochastic transitions between them. For every pair of states $\mathcal{S}, \mathcal{S}'$, $\Pr(\mathcal{S}'|\mathcal{S}a)$ gives the probability of going from state $\mathcal{S}$ to state $\mathcal{S}'$ while outputting symbol $a \in \mathcal{A}$. The statistical complexity $C_\mu$ of a process is the Shannon entropy over the causal-state distribution of its ε-machine [26]:

$$C_\mu = H(\mathcal{S}) \tag{1}$$

$C_\mu$ measures the minimum number of bits that need to be stored to optimally predict a given process. The ε-machine is a minimal and optimal predictor of the process. Fig. 1 shows an example of an ε-machine. At any given point in time the machine is in one of its states. It chooses the next state according to the transition probabilities. Once the transition is taken, the label of the corresponding edge is output as a symbol. This procedure repeats indefinitely. Successful applications of computational mechanics range from dynamical systems [?], spin systems [18], and crystallographic data [19] to molecular dynamics [20], atmospheric turbulence [21], and self-organisation [22].

The second measure of complexity under consideration is the effective measure complexity EMC introduced by Grassberger [10]. It is based on the entropy rate $h$ of a process, defined as the increase in Shannon entropy over increasing block sizes $s^L$: $h_L = H(s^L)/L$. The limit $h = \lim_{L \to \infty} h_L$ is a measure of information production or unpredictability. Completely periodic processes, for example, have zero entropy rate [6]. EMC is defined as $\mathrm{EMC} = \sum_{L=0}^{\infty} (h_L - h)$. Equivalent to it is the excess entropy $E$ which, for our purposes, is more conveniently defined [?]. The excess entropy of a process is the average mutual information between its semi-infinite past $\overleftarrow{S}$ and semi-infinite future $\overrightarrow{S}$:

$$E = I[\overleftarrow{S}; \overrightarrow{S}] \tag{2}$$

Note that $E = \mathrm{EMC}$. It can easily be shown that the excess entropy is a lower bound of the statistical complexity [9, Theorem 5]: $E \le C_\mu$. The excess entropy is the average number of bits a process stores about its past and transmits into the future. The effective measure complexity and the excess entropy can be considered as special cases of a complexity measure introduced by Tononi et al. [15, 16]. It has recently been shown that the excess entropy can be interpreted as the mutual information between a process's predictive and retrodictive causal states [11]. This result also led to a first closed-form expression of the excess entropy. The computation of that expression, however, requires the inference of both the predictive and retrodictive causal states, which is computationally very costly. We now derive a simple expression for calculating the excess entropy of a process from the predictive causal states only.

Once an ε-machine is given, there are several quantities of interest. The average amount of information stored is given by $C_\mu$. The average amount of information generated is given by the entropy rate $h$. For finite-state machines like the ε-machine it can be calculated in closed form as the uncertainty of the next symbol given the last state, $h = H(a|\mathcal{S})$ [23]. The quantity which had not been considered in this context is the amount of information which is erased at each time step. We now show how to calculate it and explain its significance for characterising a model of a complex system.
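As a minimal sketch of how the stored and generated information can be read off a finite machine, the following Python snippet computes $C_\mu = H(\mathcal{S})$ and $h = H(a|\mathcal{S})$. The dictionary encoding, the function names, and the power-iteration shortcut are illustrative assumptions, not part of the paper. The example machine is the two-state Golden Mean Process used later in the text, with transition probabilities of 1/2 out of state A assumed (they are consistent with the values quoted there).

```python
import math

# Hypothetical encoding of an epsilon-machine:
# {state: [(symbol, next_state, probability), ...]}
# Assumed probabilities for the Golden Mean Process (A emits 0 or 1 with 1/2 each).
MACHINE = {
    "A": [("0", "A", 0.5), ("1", "B", 0.5)],
    "B": [("0", "A", 1.0)],
}

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def stationary_distribution(machine, iterations=200):
    """Approximate the stationary state distribution by power iteration.

    This simple scheme assumes an aperiodic, irreducible machine.
    """
    pi = {s: 1.0 / len(machine) for s in machine}
    for _ in range(iterations):
        new = {s: 0.0 for s in machine}
        for s, edges in machine.items():
            for _symbol, nxt, p in edges:
                new[nxt] += pi[s] * p
        pi = new
    return pi

def statistical_complexity(machine):
    """C_mu = H(S): entropy of the stationary causal-state distribution."""
    return entropy(stationary_distribution(machine).values())

def entropy_rate(machine):
    """h = H(a|S): average uncertainty of the next symbol given the state."""
    pi = stationary_distribution(machine)
    return sum(pi[s] * entropy(p for _, _, p in edges)
               for s, edges in machine.items())

print(round(statistical_complexity(MACHINE), 4))  # 0.9183
print(round(entropy_rate(MACHINE), 4))            # 0.6667
```

Because the machine is deterministic in the sense used in the text, each (state, symbol) pair labels at most one edge, so the edge probabilities leaving a state are exactly the symbol probabilities used in $H(a|\mathcal{S})$.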

Landauer defines an operation to be logically irreversible if the output of the operation does not uniquely define the inputs [24]. In other words, logically irreversible operations erase information about the computer's preceding logical state. How can we apply this logical irreversibility to the ε-machine of a process? ε-machines are deterministic in the sense that the current state and the next symbol determine the next state uniquely. The reverse, however, is not necessarily true. Given the next state and the last symbol, the previous state is not always uniquely determined. If it is not, the ε-machine is logically irreversible. The information erased at each step, $I_{\mathrm{erased}}$, is given by the entropy over the last state given the last symbol and the next state. $I_{\mathrm{erased}}$ can be calculated from the ε-machine as follows. Given the triplet 'last state, symbol, next state' $\mathcal{S}a\mathcal{S}'$, the amount of information erased is

$$I_{\mathrm{erased}} = H(\mathcal{S}a) - H(a\mathcal{S}') \tag{3}$$

This uncertainty of the previous state, given that the symbol and next state are known, quantifies the irreversibility of the ε-machine. We can now derive an exact relationship between statistical complexity and excess entropy of a process.
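The erased information of Eq. (3) can be sketched numerically as the difference of two joint entropies, one over (last state, symbol) and one over (symbol, next state). The machine encoding, the hand-supplied stationary distribution, and the function names below are assumptions made for illustration; the example again uses the Golden Mean Process with assumed transition probabilities of 1/2 out of state A.

```python
import math
from collections import defaultdict

# Hypothetical encoding: {state: [(symbol, next_state, probability), ...]}
MACHINE = {
    "A": [("0", "A", 0.5), ("1", "B", 0.5)],
    "B": [("0", "A", 1.0)],
}
# Stationary state probabilities, supplied by hand for this example.
STATIONARY = {"A": 2 / 3, "B": 1 / 3}

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def erased_information(machine, stationary):
    """I_erased = H(S a) - H(a S') in bits per time step."""
    p_state_symbol = defaultdict(float)  # joint over (last state, symbol)
    p_symbol_next = defaultdict(float)   # joint over (symbol, next state)
    for state, edges in machine.items():
        for symbol, nxt, p in edges:
            weight = stationary[state] * p
            p_state_symbol[(state, symbol)] += weight
            p_symbol_next[(symbol, nxt)] += weight
    return entropy(p_state_symbol.values()) - entropy(p_symbol_next.values())

print(round(erased_information(MACHINE, STATIONARY), 4))  # 0.6667
```

The two edges A0 and B0 both land on the pair (0, A), so their weights merge in the second distribution; that merging is what lowers $H(a\mathcal{S}')$ below $H(\mathcal{S}a)$.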

Theorem. The difference between statistical complexity and excess entropy is given by the logical irreversibility of the ε-machine:

$$C_\mu - E = I_{\mathrm{erased}} \tag{4}$$



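Proof. From the definition of the excess entropy in terms of mutual information we arrive at the following expression for $E$:

$$\begin{aligned}
E &= I[\overleftarrow{S}; \overrightarrow{S}] = H(\overrightarrow{S}) - H(\overrightarrow{S}|\overleftarrow{S}) = H(\overrightarrow{S}) - H(\overrightarrow{S}|\mathcal{S}) \\
  &= H(\mathcal{S}) - H(\mathcal{S}|\overrightarrow{S}) = C_\mu - H(\mathcal{S}|\overrightarrow{S}) .
\end{aligned}$$

To prove the Theorem we have to show $H(\mathcal{S}|\overrightarrow{S}) = I_{\mathrm{erased}}$. Since knowledge of the entire future starting with symbol $a$ eventually uniquely determines the next state following on $a$, we have $H(\mathcal{S}'|\overrightarrow{S}) = 0$ (note the difference between uncertainty in last state, which can be non-vanishing, and next state). Now we can write

$$\begin{aligned}
H(\mathcal{S}|\overrightarrow{S}) &= H(\mathcal{S}|\overrightarrow{S}) - H(\mathcal{S}'|\overrightarrow{S}) = H(\mathcal{S}\overrightarrow{S}) - H(\mathcal{S}'\overrightarrow{S}) \\
 &= H(\overrightarrow{S}^{-1}|\mathcal{S}a) + H(\mathcal{S}a) - H(\overrightarrow{S}^{-1}|a\mathcal{S}') - H(a\mathcal{S}') \\
 &= H(\overrightarrow{S}^{-1}|\mathcal{S}') + H(\mathcal{S}a) - H(\overrightarrow{S}^{-1}|\mathcal{S}') - H(a\mathcal{S}') \\
 &= H(\mathcal{S}a) - H(a\mathcal{S}') .
\end{aligned}$$

The superscript $-1$ indicates the removal of the first or last symbol, respectively. In the next-to-last step, we used the fact that ε-machines are deterministic, which means $H(\mathcal{S}'|\mathcal{S}a) = 0$. □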

This leads to the corollary that for reversible ε-machines $E = C_\mu$. In analogy to thermodynamical efficiency we define the information-processing efficiency $\iota$ of an ε-machine as the ratio between excess entropy and statistical complexity:

$$\iota = \frac{E}{C_\mu} = 1 - \frac{I_{\mathrm{erased}}}{C_\mu} \tag{5}$$



From Eq. (3) we find an upper bound on the amount of information which can be erased at each step. We can rewrite Eq. (3) in the following way:

$$I_{\mathrm{erased}} = H(a|\mathcal{S}) - H(a|\mathcal{S}') = h - H(a|\mathcal{S}') , \tag{6}$$

where we used the fact that the entropies over last state $\mathcal{S}$ and next state $\mathcal{S}'$ are the same. $H(a|\mathcal{S}')$ is the uncertainty of the last symbol given the next state. Hence, we obtain an upper bound for the cost of erasure as

$$I_{\mathrm{erased}} \le h \tag{7}$$

Thus, the amount of information which can be erased in the causal-state model is upper bounded by the amount of information which is created, as one would expect.
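The step from Eq. (3) to Eqs. (6) and (7) can be spelled out with the chain rule for entropies. The following is a sketch that assumes only stationarity of the state distribution, $H(\mathcal{S}) = H(\mathcal{S}')$, and non-negativity of conditional entropy, with $\mathcal{S}$ the last state, $a$ the symbol, and $\mathcal{S}'$ the next state.

```latex
\begin{align*}
I_{\mathrm{erased}}
  &= H(\mathcal{S}a) - H(a\mathcal{S}') \\
  &= \bigl[ H(\mathcal{S}) + H(a|\mathcal{S}) \bigr]
   - \bigl[ H(\mathcal{S}') + H(a|\mathcal{S}') \bigr] \\
  &= H(a|\mathcal{S}) - H(a|\mathcal{S}')
     && \text{since } H(\mathcal{S}) = H(\mathcal{S}') \\
  &= h - H(a|\mathcal{S}') \;\le\; h
     && \text{since } H(a|\mathcal{S}') \ge 0 .
\end{align*}
```

Equality in the last line holds exactly when the next state fixes the last symbol, that is, when $H(a|\mathcal{S}') = 0$.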

Let us illustrate the Theorem with two example processes. The first one generates single 1's surrounded by 0's. No consecutive 1's are allowed. The process is called the Golden Mean Process, and its ε-machine is shown in Fig. 1. The amount of irreversibility of this ε-machine is illustrated in Table I. The information erased at each time step can be calculated from the transitions from 'last state, symbol' to 'symbol, next state'.



FIG. 1: Graphical representation of the Golden Mean Process. The causal states $\mathcal{S}$ are labelled A and B; edges are labelled with $\Pr(\mathcal{S}'|\mathcal{S}a) \,|\, a$. The Golden Mean Process is irreversible: $\iota = 0.27$.



TABLE I. Transition diagram: 'last state, symbol' to 'symbol, next state'.

Sa    aS'
A0    0A
B0    0A
A1    1B


Whenever state A is entered (on symbol 0) the last state is maximally uncertain. The amount of irreversibility can now be calculated from the transition probabilities shown in Fig. 1 using Eq. (3). The statistical complexity of the Golden Mean Process is $C_\mu = H(2/3, 1/3) = 0.9183$ bits, which yields an excess entropy $E = 0.2516$ bits. The known numerical value for the excess entropy of this process was 0.252 bits [6]. $h$ in this example is 2/3 bits. The Golden Mean Process reaches the upper bound of erased information, Eq. (6), since the uncertainty of the last symbol given the next state, $H(a|\mathcal{S}')$, is zero.
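The numbers quoted for the Golden Mean Process can be checked by hand. The sketch below assumes the usual Golden Mean transition probabilities (from state A, symbols 0 and 1 with probability 1/2 each; from state B, symbol 0 with probability 1). These are not stated in the text, but they are consistent with the quoted state distribution (2/3, 1/3) and with $h = 2/3$ bits.

```latex
\begin{align*}
C_\mu &= H(2/3,\, 1/3) = \log_2 3 - \tfrac{2}{3} \approx 0.9183 \ \text{bits} \\
h &= \tfrac{2}{3}\, H(1/2,\, 1/2) + \tfrac{1}{3} \cdot 0 = \tfrac{2}{3} \ \text{bits} \\
I_{\mathrm{erased}} &= h - H(a|\mathcal{S}') = \tfrac{2}{3} - 0 \approx 0.6667 \ \text{bits} \\
E &= C_\mu - I_{\mathrm{erased}} = \log_2 3 - \tfrac{4}{3} \approx 0.2516 \ \text{bits} \\
\iota &= E / C_\mu \approx 0.274
\end{align*}
```

The last line agrees with the efficiency of 0.27 quoted in the caption of Fig. 1.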

The second example process generates even sequences of 1's surrounded by 0's. This process is interesting because it has an infinite list of forbidden words. The previous process had but one forbidden word, two consecutive 1's (and all words containing it). We can see from the graph in Fig. 2 that the corresponding ε-machine is completely reversible. This is confirmed in Table II, which lists the possible transitions. Whenever the next state and the last symbol are known, the last state can be uniquely traced. Hence, according to the Theorem, the statistical complexity is equal to the excess entropy and we obtain for the Even Process: $C_\mu = E = 0.9183$ bits. The numerical value known for $E$ was 0.902 bits and was known to converge very slowly [6].

TABLE II. Transition diagram: 'last state, symbol' to 'symbol, next state'.

Sa    aS'
A0    0A
B1    1A
A1    1B
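The reversibility argument made with the two transition tables can be sketched as a simple check: a machine is logically reversible if every pair (symbol, next state) has exactly one possible last state. The table encoding and function names below are illustrative assumptions. The Even Process table assumes that state B emits symbol 1 and returns to A, which matches the description of even runs of 1's.

```python
from collections import defaultdict

# Hypothetical encoding of the transition tables:
# (last state, symbol) -> next state
GOLDEN_MEAN = {("A", "0"): "A", ("B", "0"): "A", ("A", "1"): "B"}
EVEN = {("A", "0"): "A", ("B", "1"): "A", ("A", "1"): "B"}

def ambiguous_predecessors(table):
    """Return the (symbol, next state) pairs reachable from more than one last state."""
    predecessors = defaultdict(set)
    for (state, symbol), next_state in table.items():
        predecessors[(symbol, next_state)].add(state)
    return {key: states for key, states in predecessors.items() if len(states) > 1}

def is_reversible(table):
    """A table is logically reversible if no (symbol, next state) pair is ambiguous."""
    return not ambiguous_predecessors(table)

# For the Golden Mean Process the pair ('0', 'A') has two predecessors, A and B.
print(ambiguous_predecessors(GOLDEN_MEAN))
print(is_reversible(GOLDEN_MEAN))  # False
print(is_reversible(EVEN))         # True
```

This check only detects whether information is erased at all; how much is erased per step still depends on the transition probabilities through Eq. (3).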

The Theorem allows us to see the fact that $E$ is a lower bound of $C_\mu$ in a new light. If there existed a process with $C_\mu < E$ it would generate more negative entropy than it uses. We consider this an information-theoretic analogue to the second law of thermodynamics for models of complex systems. No optimally predictive model can store less than $E$ bits. Put differently, any predictive model of a complex system needs to store at least $E$ bits of information. Anything beyond that must be considered inefficient. This point of view corroborates the original interpretation of the effective measure complexity given by Grassberger, who noted that EMC is the amount of memory required to optimally predict the future of a process if one could use it to 100% efficiency.

FIG. 2: Graphical representation of the Even Process. Edges are labelled $\Pr(\mathcal{S}'|\mathcal{S}a) \,|\, a$. The Even Process is reversible and thus maximally efficient: $\iota = 1$.

This leaves us with a new puzzle: How can the ε-machine, provably minimal and optimal given that one operates on discrete states and variables, be inefficient? There is the possibility that an irreversible ε-machine indicates that the process itself is information-theoretically inefficient. If, on the other hand, it implies a model inefficiency, the model can only be improved by assuming different kinds of resources, such as a different architecture or continuous states. Indeed, there are cases where the efficiency of a model is improved when the states are allowed to have quantum mechanical properties [25]. It is also possible that one has to consider a trade-off between the size of the model and its irreversibility. We conclude that an ε-machine's logical irreversibility can potentially be used as a criterion for model discrimination, both in terms of size and in terms of architecture. Measuring the complexity of a process can now be motivated by the search for an information-theoretically efficient model.

To summarise, we have applied Landauer's logical irreversibility to models of complex systems. We showed that the difference between the statistical complexity and the excess entropy of a complex system is given by the logical irreversibility of its ε-machine. Our result provides a means to quantitatively discriminate between models of complex systems. The information-theoretic analogue to the second law of thermodynamics presented here requires that any model store at least $E$ bits of information. Furthermore, if one takes the ε-machine as the starting point for a physical model of the system, the nature of resources which minimise the information erased becomes physically meaningful. The results presented here motivate further studies of computation-theoretic representations of complex systems like the ε-machine.